model_selection_diagnostics

Evaluation of Social Vulnerability Impact on Shigella Incidence and Racial Disparities

Original Research Objective: Are there residual effects of race on the incidence of Shigella after controlling for SVI?

Alternate Research Objective: Does the impact of SVI on Shigella incidence vary by race?

Primary outcome: New cases of Shigella, offset by population

Primary predictor: SVI quartile

Additional covariates: Age group, year (grouped by SVI version), urban-rural status

Libraries

library('glmmTMB')
library('lme4')
library('ggplot2')
library('ggthemes')
library('data.table')
library('DHARMa')
library('tidyverse')
library('kableExtra')

Data characteristics

The final dataset includes a row for each unique combination of census tract, time period, age group, sex, and race. The outcome variable, Cases, is the aggregate number of cases for that combination. The Pop variable corresponds to the underlying population total in that category. Because population data were not available for cross-tabulations of race and ethnicity, race and ethnicity case and population totals have been calculated separately (each race category includes both Hispanic and non-Hispanic records). Records have been merged with the SVI data that most closely corresponds to the year of diagnosis and with the NCHS urban-rural classification codes.
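The nearest-year SVI merge described above can be sketched with a data.table rolling join. This is a minimal illustration, not the actual merge code: the column names mirror the dataset, but the GEOIDs, years, and RPL_THEMES values are made up.

```r
library(data.table)

# Hypothetical case records and SVI releases for illustration
cases <- data.table(GEOID  = c("001", "001", "002"),
                    DxYear = c(2008, 2015, 2017))
svi   <- data.table(GEOID      = c("001", "001", "002", "002"),
                    sviyear    = c(2010, 2014, 2010, 2014),
                    RPL_THEMES = c(0.31, 0.35, 0.62, 0.58))

# Duplicate the year columns so the originals survive the join
svi[,   joinyear := sviyear]
cases[, joinyear := DxYear]

# roll = "nearest" matches each diagnosis year to the closest SVI release
merged <- svi[cases, on = .(GEOID, joinyear), roll = "nearest"]
merged[, .(GEOID, DxYear, sviyear, RPL_THEMES)]
```

Each case row picks up the SVI release closest in time: 2008 matches the 2010 release, 2015 and 2017 match the 2014 release.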

Data Preparation

Pacific Islander rates are very sparse and unstable, so I’m going to combine Asian and Pacific Islander records into an Asian-PI category

dat[, 'Race' := ifelse(Race=='P', 'A', Race)]
dat<-dat[, list(Cases=sum(Cases), Pop=sum(Pop)), 
         by=list(GEOID, sviyear, AgeGroup,  Race, State, County, RPL_THEMES, quartile, UrCode)]

Factor covariates and set desired reference category.

dat[, c('Race', 'UrCode', 'quartile', 'AgeGroup') := list(
            factor(Race, levels=c('W', 'B', 'A', 'I',  'M', 'O'),
                         labels=c('White', 'Black', 'Asian-PI', 'AI-AN', 'Multi', 'Other')),
            factor(UrCode, levels=c(1:6), labels=c('lrg cen met', 'lrg frg met', 'med met', 'sml met', 'micro', 'non-core')),
            factor(quartile, levels=1:4),
            factor(AgeGroup, levels=c('0-4', '5-14', '15-44', '45+')))]

I’m also going to create a county-level dataset to compare to the census tract level dataset. This will become important because I’m forced to drop out census tract records with zero population and tracts with more cases than population. Because the numerator (Cases) and denominator (Pop) are coming from two different data sources and because the denominator is based on a sample with an associated MOE, there are quite a few records with mismatched data in the Asian-PI, Other, and Black categories. Exclusion of these records may result in biased results. The county dataset uses a population-weighted mean RPL_THEMES, but if I decide to use counties as my unit of analysis, I can use the actual county-level SVI and population estimates.

rpl_county<-dat[, list(RPL_THEMES=sum(Pop*RPL_THEMES, na.rm=T)/sum(Pop)), by=list(County, sviyear)]
rpl_county[, quartile := cut(RPL_THEMES, breaks=c(0, .25, .5, .75, 1), include.lowest=T, labels=F)]

cntydat<-dat[,list(Cases=sum(Cases), Pop=sum(Pop)), by=list(County, sviyear, AgeGroup, Race, State, UrCode)]
cntydat<-merge(cntydat, rpl_county, by=c('County', 'sviyear'), all.x=T)
rm(rpl_county)

View data

Census tract-level dataset

head(dat)
         GEOID sviyear AgeGroup     Race State County RPL_THEMES quartile
1: 06001400100    2000      0-4 Asian-PI    CA  06001      0.029        1
2: 06001400100    2000      0-4    Black    CA  06001      0.029        1
3: 06001400100    2000      0-4    AI-AN    CA  06001      0.029        1
4: 06001400100    2000      0-4    Multi    CA  06001      0.029        1
5: 06001400100    2000      0-4    Other    CA  06001      0.029        1
6: 06001400100    2000      0-4    White    CA  06001      0.029        1
        UrCode Cases Pop
1: lrg cen met     0  42
2: lrg cen met     0  14
3: lrg cen met     0   0
4: lrg cen met     0  26
5: lrg cen met     0   0
6: lrg cen met     0 126

County-level dataset

head(cntydat)
   County sviyear AgeGroup     Race State      UrCode Cases   Pop RPL_THEMES
1:  06001    2000      0-4 Asian-PI    CA lrg cen met     6 41064  0.5110268
2:  06001    2000      0-4    Black    CA lrg cen met     1 29908  0.5110268
3:  06001    2000      0-4    AI-AN    CA lrg cen met     0  1214  0.5110268
4:  06001    2000      0-4    Multi    CA lrg cen met     0 21442  0.5110268
5:  06001    2000      0-4    Other    CA lrg cen met     5 26036  0.5110268
6:  06001    2000      0-4    White    CA lrg cen met    10 77092  0.5110268
   quartile
1:        3
2:        3
3:        3
4:        3
5:        3
6:        3

Data quality checks

Number of records with more cases than population by race

Race      Tract  County
White        39       0
Black       164       5
Asian-PI    105       7
AI-AN        63       4
Multi        58       1
Other       239      12

Percent of records with zero cases by race

Race      Tract   Tract_Percent  County  County_Percent
White     200989  93.8%           6074   62.5%
Black     207903  97.0%           8181   84.2%
Asian-PI  213504  99.6%           9266   95.4%
AI-AN     214074  99.9%           9536   98.1%
Multi     214021  99.8%           9466   97.4%
Other     212877  99.3%           9153   94.2%

There are 668 census tract-level records with more cases than population. This may be due to sampling error in the population data, which is sourced from the ACS for sviyears after 2005, or to differential classification of individuals in the ACS data compared to FoodNet. Comparatively, there were 29 county-level records with more cases than population.

Ratio of variance to mean of cases by race and year for census tract- and county-level datasets
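The variance-to-mean ratio summarized above (the index of dispersion; values well above 1 suggest overdispersion relative to a Poisson assumption) is a one-line data.table aggregation. The counts below are invented purely for illustration:

```r
library(data.table)

# Hypothetical case counts for two race groups (illustration only)
toy <- data.table(Race  = rep(c("White", "Black"), each = 6),
                  Cases = c(0, 0, 1, 0, 2, 0,   0, 5, 0, 0, 9, 1))

# Ratio > 1 suggests overdispersion relative to a Poisson model
vmr <- toy[, .(vmr = var(Cases) / mean(Cases)), by = Race]
vmr
```

In the real data this would be grouped by Race and sviyear for each of the tract- and county-level datasets.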

Distribution of selected geographic units by number of cases for all race groups

These plots show the distribution of tracts and counties by number of cases for each race category across all years and age groups

Note

I have records with zero population, so including population counts as an offset would produce a log(0) = -Inf error. Therefore, I’m dropping records with zero population and including the log of Pop as an offset.

ctdat<-dat[Pop>0 & Cases <= Pop & !is.na(Race), ]
cntydat<-cntydat[Pop>0 & Cases <= Pop & !is.na(Race), ]

I’m creating a boolean variable to classify a tract or county as having one or more cases in a specific category. This value will be used as the outcome in the logistic regression models below. I’m also going to create a variable for sviyear centered around the mean to help with model fit and convergence.

ctdat$HasCases<-ifelse(ctdat$Cases>0, 1, 0)
cntydat$HasCases<-ifelse(cntydat$Cases>0, 1, 0)

ctdat$CenteredYear<-ctdat$sviyear-mean(ctdat$sviyear)
ctdat$sviyear<-factor(ctdat$sviyear)

cntydat$CenteredYear<-cntydat$sviyear-mean(cntydat$sviyear)
cntydat$sviyear<-factor(cntydat$sviyear)

Bivariate associations

Percent of tracts with one or more cases by SVI quartile and Race

Percent of counties with one or more cases by SVI quartile and Race

Shigella Rate per 100K by SVI Quartile and Race among Tracts with One or More Cases

Shigella Rate per 100K by SVI Quartile and Race among Counties with One or More Cases

Log(Shigella Rate per 100K) by SVI Score among Tracts with One or More Cases, by Age for Each Race Category

Log(Shigella Rate per 100K) by SVI Score among Counties with One or More Cases, by Age for Each Race Category

It looks like the effect of SVI score (RPL_THEMES) on Shigella incidence among counties with 1+ cases might be closer to a quadratic trend. I’ll create a new variable rpl_themes2 to capture the quadratic effect of SVI score.

cntydat[, rpl_themes2 := RPL_THEMES^2]

Multivariate Analysis

I’m going to stratify my analysis by age group

ctdat<-ctdat %>% split(.$AgeGroup)
cntydat<-cntydat %>% split(.$AgeGroup)

Logistic Regression Model Compared to Binomial Portion of Hurdle Model

Starting off with a very simple approach, I’m modeling the log(odds) of a county having one or more Shigella cases using a generalized linear mixed-effects model (GLMM) from the lme4 package. This model allows me to include random intercepts for State. The glmer function produces practically equivalent results to the glmmTMB package, with slight differences due to the Laplace approximation used by glmmTMB.

Here, instead of including the log(Population) as an offset term as I would if modeling a rate, I’m just controlling for it.
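The distinction between the two choices is that an offset fixes the log(Pop) coefficient at exactly 1, while including it as a covariate lets the model estimate that coefficient. A toy Poisson rate example (simulated data, not the study data) makes this concrete:

```r
# Toy data: counts proportional to population, so the true log(pop)
# coefficient is 1
set.seed(7)
n   <- 300
pop <- round(runif(n, 100, 10000))
y   <- rpois(n, lambda = 0.002 * pop)

# Offset: coefficient on log(pop) fixed at 1 (models a rate)
fit_offset <- glm(y ~ 1, offset = log(pop), family = poisson)

# Covariate: coefficient on log(pop) estimated from the data
fit_covar  <- glm(y ~ log(pop), family = poisson)

coef(fit_covar)["log(pop)"]  # estimated, close to (but not exactly) 1
```

In the logistic models below, estimating the coefficient is the natural choice, since an offset of log(Pop) on the log-odds scale would impose a fixed population effect that has no rate interpretation.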

mod.log0<-glmer(HasCases ~ quartile + log(Pop) + (1|State), family=binomial(link='logit'),    data=cntydat$`0-4`)

This model is a mirror image of the binomial portion of a hurdle model: the binomial portion of the hurdle model models the log odds of a county having no cases, whereas the logistic model models the log odds of a county having one or more cases. The only difference is the sign on the parameter estimates (plus the extra conditional model in the hurdle model).
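The sign-flip equivalence can be verified directly on simulated data (a sketch, not the study data): fitting a logistic regression to y and to 1 - y yields coefficients that are exact negatives of each other, because flipping the outcome flips the odds.

```r
# Simulated binary outcome with one covariate
set.seed(1)
n <- 500
x <- rnorm(n)
y <- rbinom(n, 1, plogis(-1 + 0.8 * x))

fit_pos  <- glm(y ~ x,     family = binomial)  # log odds of 1+ cases
fit_zero <- glm(1 - y ~ x, family = binomial)  # log odds of zero cases

# Coefficients are exact negatives, so their sums are ~0
round(coef(fit_pos) + coef(fit_zero), 10)
```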

Equivalent hurdle model for comparison

hurdle.comp<-glmmTMB(Cases ~ quartile + offset(log(Pop)) + (1|State), ziformula= ~log(Pop) + quartile + (1|State), family=truncated_poisson, data=cntydat$`0-4`)

Log(odds) and confidence intervals from logistic regression

mod.log0.estimates<-round(cbind(na.omit(confint(mod.log0, method='Wald')), fixef(mod.log0)), 4)
colnames(mod.log0.estimates)[3]<-'Estimate'
mod.log0.estimates
              2.5 %  97.5 % Estimate
(Intercept) -9.8693 -8.6154  -9.2424
quartile     0.1681  0.3674   0.2677
log(Pop)     0.8470  0.9332   0.8901

Log(odds) and confidence intervals from hurdle regression

hurdle.comp.estimates<-round(confint(hurdle.comp), 4)
hurdle.comp.estimates<-hurdle.comp.estimates[grepl('zi.', row.names(hurdle.comp.estimates)),]
hurdle.comp.estimates
                               2.5 %  97.5 % Estimate
zi.(Intercept)                8.6150  9.8697   9.2424
zi.log(Pop)                  -0.9332 -0.8470  -0.8901
zi.quartile                  -0.3674 -0.1680  -0.2677
zi.Std.Dev.(Intercept)|State  0.3940  0.9976   0.6270

Full logistic regression model of log(odds) of having one or more cases

I will try out a few different models with RPL_THEMES (continuous SVI score) and Race as my main covariates of interest, including additional fixed and random effects.

County-Level Data Base model

mod.log1<-glmer(HasCases ~ RPL_THEMES + Race + log(Pop) + sviyear + (1|State), 
                family=binomial(link='logit'),  
                data=cntydat$`0-4`)

Add Urban-Rural covariate

mod.log2<-update(mod.log1, . ~ . + UrCode)

Add interaction term between race and SVI score. Note: I’m updating the optimizer because this model will not converge with the interaction term otherwise.

mod.log3<-update(mod.log2, . ~ . + Race*RPL_THEMES, 
                 control=glmerControl(optimizer="bobyqa", optCtrl=list(maxfun=2e5)))

Census Tract-Level Data

Base model

mod.log1.1<-glmer(HasCases ~ RPL_THEMES + Race + log(Pop) + sviyear + (1|State), 
                family=binomial(link='logit'),  
                data=ctdat$`0-4` )

Add Urban-Rural covariate

mod.log2.1<-update(mod.log1.1, . ~ . + UrCode)

Add interaction term between race and SVI score. Note: I’m updating the optimizer because this model will not converge with the interaction term otherwise.

mod.log3.1<-update(mod.log2.1, . ~ . + Race*RPL_THEMES, 
                 control=glmerControl(optimizer="bobyqa", optCtrl=list(maxfun=2e5)))

Change random effect of State to County nested within state

mod.log4.1<-update(mod.log3.1, . ~ . + (1|County:State))

View Fixed Effect Odds Ratios and 95% Confidence Intervals

Note

Note, the interpretation of the race and SVI parameters in Model 4 is different due to the interaction term!
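Concretely, with an interaction term the SVI odds ratio for a non-reference race is the main-effect OR multiplied by that race’s interaction OR. Using the point estimates reported for mod.log3.1 (RPL_THEMES OR 5.02; RPL_THEMES:RaceBlack OR 0.66):

```r
# Point estimates from the table below (mod.log3.1), on the log-odds scale
b_svi       <- log(5.02)  # RPL_THEMES main effect (reference race: White)
b_svi_black <- log(0.66)  # RPL_THEMES:RaceBlack interaction

# OR per 1-unit increase in SVI score, by race
or_white <- exp(b_svi)                # 5.02
or_black <- exp(b_svi + b_svi_black)  # 5.02 * 0.66, about 3.31
c(White = or_white, Black = or_black)
```

So in the interaction models the RPL_THEMES row alone describes only the reference (White) category.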

Models without interaction term

Models WITH interaction term

Table of fixed effects: odds ratio (95% CI)

Covariate mod.log1 mod.log1.1 mod.log2 mod.log2.1 mod.log3 mod.log3.1 mod.log4.1
(Intercept) 0 (0-0) 0 (0-0) 0 (0-0) 0 (0-0) 0 (0-0) 0 (0-0) 0 (0-0)
RPL_THEMES 3.5 (2.16-5.67) 4.42 (3.93-4.98) 4.67 (2.63-8.28) 4.09 (3.63-4.62) 12.82 (6.22-26.41) 5.02 (4.29-5.88) 4.84 (4.12-5.69)
RPL_THEMES:RaceAI-AN NA NA NA NA 0.35 (0.04-2.9) 1.83 (0.57-5.85) 1.08 (0.35-3.32)
RPL_THEMES:RaceAsian-PI NA NA NA NA 0.12 (0.03-0.57) 0.27 (0.16-0.47) 0.25 (0.15-0.44)
RPL_THEMES:RaceBlack NA NA NA NA 0.09 (0.04-0.23) 0.66 (0.5-0.86) 0.61 (0.46-0.8)
RPL_THEMES:RaceMulti NA NA NA NA 0.34 (0.07-1.63) 0.47 (0.23-0.95) 0.44 (0.22-0.88)
RPL_THEMES:RaceOther NA NA NA NA 0.54 (0.16-1.8) 0.87 (0.5-1.51) 0.77 (0.44-1.34)
RaceAI-AN 1.39 (0.96-2.02) 0.73 (0.56-0.94) 1.09 (0.74-1.62) 0.74 (0.57-0.96) 2.17 (0.61-7.66) 0.46 (0.18-1.18) 0.68 (0.28-1.65)
RaceAsian-PI 1.68 (1.28-2.19) 0.59 (0.5-0.7) 1.36 (1.02-1.81) 0.6 (0.5-0.71) 4.24 (1.96-9.14) 1.19 (0.87-1.62) 1.22 (0.9-1.66)
RaceBlack 1.85 (1.55-2.21) 1.49 (1.39-1.6) 1.68 (1.4-2.02) 1.46 (1.35-1.57) 6.95 (3.88-12.42) 1.91 (1.57-2.32) 1.95 (1.6-2.38)
RaceMulti 0.42 (0.33-0.55) 0.21 (0.17-0.25) 0.35 (0.27-0.47) 0.21 (0.17-0.25) 0.69 (0.29-1.64) 0.33 (0.21-0.52) 0.34 (0.22-0.53)
RaceOther 1.49 (1.19-1.88) 0.63 (0.55-0.72) 1.25 (0.98-1.6) 0.64 (0.56-0.73) 1.89 (0.9-3.96) 0.69 (0.45-1.07) 0.76 (0.5-1.16)
UrCodelrg frg met NA NA 0.65 (0.47-0.89) 0.66 (0.6-0.73) 0.67 (0.49-0.92) 0.66 (0.6-0.72) 0.58 (0.39-0.88)
UrCodemed met NA NA 0.91 (0.65-1.27) 1.05 (0.95-1.17) 0.97 (0.69-1.35) 1.05 (0.95-1.17) 0.71 (0.46-1.09)
UrCodemicro NA NA 0.61 (0.43-0.86) 0.95 (0.85-1.06) 0.66 (0.46-0.93) 0.94 (0.84-1.05) 0.59 (0.39-0.9)
UrCodenon-core NA NA 0.48 (0.33-0.7) 0.65 (0.56-0.74) 0.53 (0.37-0.78) 0.64 (0.56-0.74) 0.42 (0.28-0.65)
UrCodesml met NA NA 0.7 (0.5-0.99) 0.99 (0.89-1.11) 0.75 (0.53-1.07) 0.99 (0.88-1.1) 0.6 (0.39-0.93)
log(Pop) 2.56 (2.42-2.7) 1.93 (1.87-2) 2.43 (2.28-2.58) 1.95 (1.89-2.02) 2.5 (2.34-2.66) 1.97 (1.9-2.04) 2.01 (1.95-2.09)
sviyear2010 0.9 (0.73-1.11) 1.25 (1.13-1.39) 0.96 (0.77-1.18) 1.26 (1.13-1.4) 0.93 (0.75-1.15) 1.24 (1.12-1.38) 1.24 (1.11-1.37)
sviyear2014 1.18 (0.96-1.46) 1.42 (1.28-1.57) 1.24 (1-1.53) 1.42 (1.28-1.58) 1.21 (0.98-1.49) 1.41 (1.27-1.56) 1.4 (1.27-1.56)
sviyear2016 1.74 (1.41-2.15) 1.38 (1.24-1.53) 1.75 (1.42-2.16) 1.39 (1.25-1.55) 1.74 (1.41-2.15) 1.38 (1.24-1.54) 1.38 (1.24-1.54)
sviyear2018 0.85 (0.68-1.05) 0.81 (0.72-0.91) 0.87 (0.7-1.08) 0.81 (0.72-0.91) 0.85 (0.69-1.06) 0.8 (0.72-0.9) 0.79 (0.71-0.89)

Model Fit Comparison and Diagnostics

County-Level Models

         npar    AIC    BIC  logLik deviance
mod.log1   13 6210.6 6306.4 -3092.3   6184.6
mod.log2   18 6190.0 6322.7 -3077.0   6154.0
mod.log3   23 6171.0 6340.5 -3062.5   6125.0

Census-Tract Level Models

           npar   AIC   BIC logLik deviance
mod.log1.1   13 35034 35165 -17504    35008
mod.log2.1   18 34890 35072 -17427    34854
mod.log3.1   23 34869 35101 -17412    34823
mod.log4.1   24 34186 34427 -17069    34138

Evaluate quantile residuals from full logistic model

Quantile residuals have been shown to be more appropriate for GLMM model diagnostics. The cumulative probability that an observation is less than or equal to the fitted value of a given distribution is determined and then used to find the corresponding value of the standard normal variate (i.e., quantile residual).
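A minimal by-hand version of this calculation for a single Poisson count (the construction that DHARMa generalizes via simulation; lambda and y here are arbitrary illustrative values):

```r
# Randomized quantile residual for one Poisson observation: draw u
# uniformly between F(y - 1) and F(y), then map to a standard normal
# quantile. For a well-specified model these residuals are N(0, 1).
set.seed(42)
lambda <- 3                  # fitted mean (hypothetical)
y      <- 5                  # observed count
lo <- ppois(y - 1, lambda)   # F(y - 1)
hi <- ppois(y,     lambda)   # F(y)
u  <- runif(1, lo, hi)       # randomization handles the discreteness
r  <- qnorm(u)               # quantile residual
r
```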

I’m going to test output from Model #2 (County-level model without interaction).

Model #4 appears to have the best fit among the census tract-level models (based on the AIC) but answers a slightly different question than the original research question (is there a residual effect of race on Shigella incidence after accounting for SVI?). Due to the inclusion of the interaction between race and SVI, Model #4 shows the impact of SVI for each race category.

Simulate Residuals for Model #2

mod2simout<-simulateResiduals(fittedModel =  mod.log2, plot=F)

View simulated residuals

plot(mod2simout)

The Kolmogorov-Smirnov test shows a poor goodness of fit in Model #2; however, this test is prone to Type I errors, and because I have a large amount of data (n=11699), this result might not be too much of an issue. The QQ plot is nearly linear, suggesting that the overall distribution is acceptable. I’ll go through some more diagnostics to evaluate whether there are any obvious problems with the fit of Model #2.

Non-parametric test for overdispersion

testDispersion(mod2simout)


    DHARMa nonparametric dispersion test via sd of residuals fitted vs.
    simulated

data:  simulationOutput
dispersion = 1.1414, p-value = 0.28
alternative hypothesis: two.sided

Test for zero-inflation

testZeroInflation(mod2simout)


    DHARMa zero-inflation test via comparison to expected zeros with
    simulation under H0 = fitted model

data:  simulationOutput
ratioObsSim = 0.97484, p-value = 0.288
alternative hypothesis: two.sided

Test for outliers

testOutliers(mod.log2)


    DHARMa outlier test based on exact binomial test with approximate
    expectations

data:  mod.log2
outliers at both margin(s) = 108, observations = 11699, p-value =
0.1311
alternative hypothesis: true probability of success is not equal to 0.007968127
95 percent confidence interval:
 0.00757880 0.01113494
sample estimates:
frequency of outliers (expected: 0.00796812749003984 ) 
                                           0.009231558 

There is no significant difference in the observed frequency of outliers compared to expected.

Outlier Type
Observed 104
Expected 93

View simulated residuals plotted against covariates in Model #2

plotResiduals(mod2simout, form = na.omit(cntydat$`0-4`)$RPL_THEMES)

plotResiduals(mod2simout, form = na.omit(cntydat$`0-4`)$Race)

plotResiduals(mod2simout, form = na.omit(cntydat$`0-4`)$sviyear)

plotResiduals(mod2simout, form = na.omit(cntydat$`0-4`)$UrCode)

plotResiduals(mod2simout, form = na.omit(cntydat$`0-4`)$Pop)

Hurdle Model (County-Level Data)

I will create a list of hurdle poisson and hurdle negative binomial models to compare fit. Each model includes the log of population as an offset term and a random intercept for state and/or county nested in state in the conditional portion of the model. Different covariates are evaluated in both the conditional and the zero-inflation (binomial) portion of the model.

Truncated Poisson Models

Null Model

modhur0<-glmmTMB(Cases ~ offset(log(Pop)) + (1|State), zi= ~1, family=truncated_poisson, data=cntydat$`0-4`)

Base Model, includes essential covariates

modhur1<-update(modhur0, . ~ . + Race + RPL_THEMES + sviyear)

Add urban-rural designation

modhur2<-update(modhur1, . ~ . + UrCode)

Add quadratic trend of RPL_THEME

modhur3<-update(modhur2, . ~ . +  (rpl_themes2))
#Compare models
hurdle.poisson.comparison<-anova(modhur0, modhur1, modhur2, modhur3)
hurdle.poisson.comparison[,c(1:5, 8)] %>% arrange(desc(AIC))
        Df   AIC   BIC  logLik deviance Pr(>Chisq)    
modhur0  3 19401 19423 -9697.6    19395               
modhur1 13 17891 17987 -8932.6    17865    < 2e-16 ***
modhur2 18 17732 17864 -8847.9    17696    < 2e-16 ***
modhur3 19 17731 17871 -8846.4    17693    0.08349 .  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

It doesn’t appear that the quadratic effect of RPL_THEMES adds anything to the model, so I’m dropping that term.

Add covariates to binomial portion of the model

modhur3<-update(modhur2, ziformula = ~ . )

Truncated Negative Binomial Models

modhur0.1<-update(modhur0, family=truncated_nbinom2)
modhur1.1<-update(modhur1, family=truncated_nbinom2)
modhur2.1<-update(modhur2, family=truncated_nbinom2)
modhur3.1<-update(modhur3, family=truncated_nbinom2)

Hurdle model comparison

Conditional Formula ZI Formula Distribution AIC Convergence DF Dispersion Parameter
Cases ~ offset(log(Pop)) + (1 | State) 1 truncated_poisson 19401.19 Yes 3 1.0000000
Cases ~ offset(log(Pop)) + (1 | State) 1 truncated_nbinom2 16169.63 Yes 4 0.3671244
Cases ~ (1 | State) + Race + RPL_THEMES + sviyear + offset(log(Pop)) 1 truncated_poisson 17891.15 Yes 13 1.0000000
Cases ~ (1 | State) + Race + RPL_THEMES + sviyear + offset(log(Pop)) 1 truncated_nbinom2 15941.57 Yes 14 0.8542466
Cases ~ (1 | State) + Race + RPL_THEMES + sviyear + UrCode + offset(log(Pop)) 1 truncated_poisson 17731.83 Yes 18 1.0000000
Cases ~ (1 | State) + Race + RPL_THEMES + sviyear + UrCode + offset(log(Pop)) 1 truncated_nbinom2 15939.84 Yes 19 0.8946618
Cases ~ (1 | State) + Race + RPL_THEMES + sviyear + UrCode + offset(log(Pop)) (1 | State) + Race + RPL_THEMES + sviyear + UrCode truncated_poisson 15207.73 Yes 34 1.0000000
Cases ~ (1 | State) + Race + RPL_THEMES + sviyear + UrCode + offset(log(Pop)) (1 | State) + Race + RPL_THEMES + sviyear + UrCode truncated_nbinom2 13415.74 Yes 35 0.8946559

Extract model with best overall fit

final_mod<-models[[which.min(sapply(models, AIC))]]

Final Model Summary

summary(final_mod)
 Family: truncated_nbinom2  ( log )
Formula:          Cases ~ (1 | State) + Race + RPL_THEMES + sviyear + UrCode +  
    offset(log(Pop))
Zero inflation:         ~.
Data: cntydat$`0-4`

     AIC      BIC   logLik deviance df.resid 
 13415.7  13673.6  -6672.9  13345.7    11664 

Random effects:

Conditional model:
 Groups Name        Variance Std.Dev.
 State  (Intercept) 0.623    0.7893  
Number of obs: 11699, groups:  State, 10

Zero-inflation model:
 Groups Name        Variance Std.Dev.
 State  (Intercept) 0.4914   0.701   
Number of obs: 11699, groups:  State, 10

Dispersion parameter for truncated_nbinom2 family (): 0.895 

Conditional model:
                  Estimate Std. Error z value Pr(>|z|)    
(Intercept)       -9.80933    0.33093 -29.642  < 2e-16 ***
RaceBlack          0.82618    0.08007  10.318  < 2e-16 ***
RaceAsian-PI       0.04081    0.18447   0.221 0.824915    
RaceAI-AN          1.05289    0.29136   3.614 0.000302 ***
RaceMulti         -0.12911    0.21783  -0.593 0.553359    
RaceOther          1.25231    0.13677   9.156  < 2e-16 ***
RPL_THEMES         1.54030    0.33204   4.639 3.50e-06 ***
sviyear2010       -0.37077    0.11372  -3.260 0.001112 ** 
sviyear2014       -0.50732    0.11489  -4.416 1.01e-05 ***
sviyear2016       -0.14750    0.12475  -1.182 0.237033    
sviyear2018       -0.80049    0.13215  -6.057 1.38e-09 ***
UrCodelrg frg met -0.19967    0.12710  -1.571 0.116178    
UrCodemed met      0.05366    0.14435   0.372 0.710096    
UrCodesml met      0.07126    0.14953   0.477 0.633675    
UrCodemicro        0.23843    0.15083   1.581 0.113914    
UrCodenon-core     0.01036    0.17225   0.060 0.952049    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Zero-inflation model:
                  Estimate Std. Error z value Pr(>|z|)    
(Intercept)       -1.36073    0.29602  -4.597 4.29e-06 ***
RaceBlack          0.89433    0.07551  11.845  < 2e-16 ***
RaceAsian-PI       2.31655    0.11263  20.568  < 2e-16 ***
RaceAI-AN          3.34070    0.17441  19.154  < 2e-16 ***
RaceMulti          3.04576    0.12675  24.029  < 2e-16 ***
RaceOther          2.07071    0.09908  20.899  < 2e-16 ***
RPL_THEMES        -0.22986    0.25180  -0.913  0.36133    
sviyear2010       -0.85910    0.09795  -8.771  < 2e-16 ***
sviyear2014       -0.86220    0.09720  -8.871  < 2e-16 ***
sviyear2016       -0.62975    0.09956  -6.325 2.53e-10 ***
sviyear2018       -0.30792    0.10285  -2.994  0.00275 ** 
UrCodelrg frg met  2.03003    0.14596  13.908  < 2e-16 ***
UrCodemed met      2.00529    0.15319  13.090  < 2e-16 ***
UrCodesml met      2.50446    0.15693  15.959  < 2e-16 ***
UrCodemicro        2.77952    0.15325  18.137  < 2e-16 ***
UrCodenon-core     3.67074    0.16002  22.939  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

It looks like I could probably drop the urban-rural designation from the conditional portion of the model and RPL_THEMES from the binomial portion of the model.

Visualize simulated residuals from best fit model.

final_mod<-update(final_mod, . ~ . -UrCode, ziformula = ~ Race +  sviyear + UrCode)
summary(final_mod)
 Family: truncated_nbinom2  ( log )
Formula:          
Cases ~ (1 | State) + Race + RPL_THEMES + sviyear + offset(log(Pop))
Zero inflation:         ~Race + sviyear + UrCode
Data: cntydat$`0-4`

     AIC      BIC   logLik deviance df.resid 
 13757.9  13964.2  -6850.9  13701.9    11671 

Random effects:

Conditional model:
 Groups Name        Variance Std.Dev.
 State  (Intercept) 0.6177   0.7859  
Number of obs: 11699, groups:  State, 10

Dispersion parameter for truncated_nbinom2 family (): 0.854 

Conditional model:
              Estimate Std. Error z value Pr(>|z|)    
(Intercept)  -10.03369    0.29856  -33.61  < 2e-16 ***
RaceBlack      0.83274    0.08002   10.41  < 2e-16 ***
RaceAsian-PI   0.01231    0.18446    0.07 0.946798    
RaceAI-AN      1.02626    0.29336    3.50 0.000468 ***
RaceMulti     -0.15835    0.21761   -0.73 0.466821    
RaceOther      1.25252    0.13841    9.05  < 2e-16 ***
RPL_THEMES     1.97767    0.25811    7.66 1.83e-14 ***
sviyear2010   -0.34589    0.11458   -3.02 0.002538 ** 
sviyear2014   -0.53601    0.11529   -4.65 3.33e-06 ***
sviyear2016   -0.17320    0.12544   -1.38 0.167347    
sviyear2018   -0.82201    0.13246   -6.21 5.45e-10 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Zero-inflation model:
                  Estimate Std. Error z value Pr(>|z|)    
(Intercept)       -1.40412    0.14541  -9.656  < 2e-16 ***
RaceBlack          0.80737    0.07316  11.036  < 2e-16 ***
RaceAsian-PI       2.24654    0.10939  20.536  < 2e-16 ***
RaceAI-AN          3.26317    0.16826  19.394  < 2e-16 ***
RaceMulti          2.91850    0.12346  23.639  < 2e-16 ***
RaceOther          1.97697    0.09632  20.525  < 2e-16 ***
sviyear2010       -0.82224    0.09547  -8.613  < 2e-16 ***
sviyear2014       -0.82331    0.09476  -8.689  < 2e-16 ***
sviyear2016       -0.60128    0.09708  -6.193 5.89e-10 ***
sviyear2018       -0.29201    0.10032  -2.911   0.0036 ** 
UrCodelrg frg met  1.86586    0.13293  14.036  < 2e-16 ***
UrCodemed met      1.94684    0.14135  13.774  < 2e-16 ***
UrCodesml met      2.26486    0.14189  15.962  < 2e-16 ***
UrCodemicro        2.72396    0.13795  19.746  < 2e-16 ***
UrCodenon-core     3.43738    0.14241  24.138  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
simulationOutput <- simulateResiduals(fittedModel = final_mod, plot=F) 
plot(simulationOutput)
DHARMa:testOutliers with type = binomial may have inflated Type I error rates for integer-valued distributions. To get a more exact result, it is recommended to re-run testOutliers with type = 'bootstrap'. See ?testOutliers for details

testDispersion(simulationOutput = simulationOutput, alternative ="less")


    DHARMa nonparametric dispersion test via sd of residuals fitted vs.
    simulated

data:  simulationOutput
dispersion = 0.99515, p-value = 0.656
alternative hypothesis: less
testOutliers(simulationOutput)
DHARMa:testOutliers with type = binomial may have inflated Type I error rates for integer-valued distributions. To get a more exact result, it is recommended to re-run testOutliers with type = 'bootstrap'. See ?testOutliers for details


    DHARMa outlier test based on exact binomial test with approximate
    expectations

data:  simulationOutput
outliers at both margin(s) = 143, observations = 11699, p-value =
1.547e-06
alternative hypothesis: true probability of success is not equal to 0.007968127
95 percent confidence interval:
 0.01031151 0.01438313
sample estimates:
frequency of outliers (expected: 0.00796812749003984 ) 
                                            0.01222327 

Outliers are flagged as a problem for this model. However, the DHARMa documentation indicates that the binomial outlier test may have an inflated Type I error rate for integer-valued distributions and recommends re-running the test with type = 'bootstrap' for a more exact result.

testOutliers(simulationOutput, type='bootstrap')


    DHARMa bootstrapped outlier test

data:  simulationOutput
outliers at both margin(s) = 54, observations = 11699, p-value = 0.22
alternative hypothesis: two.sided
 percent confidence interval:
 0.001066330 0.007707924
sample estimates:
outlier frequency (expected: 0.00290794084964527 ) 
                                       0.004615779 

Plot residuals against covariates in model

plotResiduals(simulationOutput, form = na.omit(cntydat$`0-4`)$RPL_THEMES)

plotResiduals(simulationOutput, form = na.omit(cntydat$`0-4`)$Race)

plotResiduals(simulationOutput, form = na.omit(cntydat$`0-4`)$sviyear)

plotResiduals(simulationOutput, form = na.omit(cntydat$`0-4`)$UrCode)

plotResiduals(simulationOutput, form = na.omit(cntydat$`0-4`)$Pop)